-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-1533: [JAVA] realloc should consider the existing buffer capacity for computing target memory requirement #1097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Since we have a 0.7.0 incoming shortly, I want to make sure that this is not a blocker. We can always release 0.7.1 with such bug fixes shortly thereafter. |
jacques-n
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change makes sense. We should make sure the allocation is a power of two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should make sure this is a power of 2.
|
addressed review comments |
6bf1b8a to
9b541b0
Compare
…computing target memory requirement
9b541b0 to
66620f1
Compare
|
Added unit tests. |
|
Without the code-changes, the unit test fails with segmentation fault (as described in the problem above) Tests run: 21, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.209 s <<< FAILURE! - in org.apache.arrow.vector.TestValueVector |
| */ | ||
|
|
||
| @Test | ||
| public void testReallocAfterVectorTransfer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests for a NullableBitVector and NullableVarCharVector? Does MapVector or ListVector also need a test?
| baseSize = (long)currentBufferCapacity; | ||
| } | ||
| long newAllocationSize = baseSize * 2L; | ||
| newAllocationSize = Long.highestOneBit(newAllocationSize - 1) << 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use BaseAllocator.nextPowerOfTwo()
|
@jacques-n , added tests for fixed-width vector, variable width vector and bit vector. To further test correctness, I verified that without the code changes, unit tests fail with expected segmentation fault in reAlloc(). At this point, I don't think we need tests for other vector types for the problem this patch is fixing. However, I have two test related patches in pipeline (1) tests for mapvector -- we don't have any unit tests for MapVector I will ensure that those patches cover tests for this scenario as well. |
|
The unit tests added here cover both nullable and non-nullable vectors. |
…ty for computing target memory requirement cc @jacques-n , This is same as #1097 The latter one was closed as I had to rename the branch correctly and use the correct JIRA number. Author: siddharth <siddharth@dremio.com> Closes #1112 from siddharthteotia/ARROW-1533 and squashes the following commits: 4c97be4 [siddharth] ARROW-1533: realloc should consider the existing buffer capacity for computing target memory requirement
…ty for computing target memory requirement cc @jacques-n , This is same as apache#1097 The latter one was closed as I had to rename the branch correctly and use the correct JIRA number. Author: siddharth <siddharth@dremio.com> Closes apache#1112 from siddharthteotia/ARROW-1533 and squashes the following commits: 4c97be4 [siddharth] ARROW-1533: realloc should consider the existing buffer capacity for computing target memory requirement
cc @jacques-n , @StevenMPhillips
Problem description:
We recently encountered a problem when we were trying to add JSON files with complex schema as datasets.
Initially we started with a Float8Vector with default memory allocation of (4096 * 8) 32KB.
Went through several iterations of setSafe() to trigger a realloc() from 32KB to 64KB.
Another round of setSafe() calls to trigger a realloc() from 64KB to 128KB
After that we encountered a BigInt and promoted our vector to UnionVector. This required us to create a UnionVector with BigIntVector and Float8Vector. The latter required us to transfer the Float8Vector we were earlier working with to the Float8Vector inside the Union.
As part of transferTo(), the target Float8Vector got all the ArrowBuf state (capacity, buffer contents) etc transferred from the source vector.
Later, a realloc was triggered on the Float8Vector inside the UnionVector.
The computation inside realloc() to determine the amount of memory to be reallocated goes wrong since it makes the decision based on allocateSizeInBytes – although this vector was created as part of transfer() from 128KB source vector, allocateSizeInBytes is still at the initial/default value of 32KB
We end up allocating a 64KB buffer and attempt to copy 128KB over 64KB and seg fault when invoking setBytes() whereas our assumption was we will be reallocating to 256KB.
There is a wrong assumption in realloc() that allocateSizeInBytes is always equal to data.capacity(). The particular scenario described above exposes where this assumption could go wrong.